npj Precision Oncology
Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match npj Precision Oncology's content profile, based on 14 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit.
Leyva, A.; Akbar, A.; Niazi, K.
Molecular subtyping of cancer is traditionally defined in transcriptomic space, yet routine clinical deployment is limited by the availability and cost of sequencing. Meanwhile, histopathology captures rich morphological information that is known to correlate with molecular state but lacks a principled, mechanistic bridge to gene-level representations. We propose a graph-constrained learning framework that aligns morphology-derived signals with a fixed, data-driven gene network discovered via hierarchical Monte Carlo screening. New gene sets for classification are derived by random sampling, and the coexpression network of the resulting graph is used to constrain the training of a purely morphological model that never uses gene expression as input. The resulting model performs subtype prediction from morphology alone while being explicitly forced to operate through a gene-structured latent space; this structural alignment is enforced during training. For Moffitt classification of pancreatic cancer using the PANCAN and TCGA datasets, the model achieves a reported AUC of 85% using an alternative gene set network structure, while the alternative gene set itself achieves an AUC of 84% across all pancreatic cancer patients in the dataset with an assigned subtype. This demonstrates that virtual transcriptomics can provide biologically grounded molecular insights from routine histopathology slides alone, potentially expanding access to precision oncology in resource-limited settings.
Maitra, C.; Das, V.; Seal, D. B.; De, R. K.
Lung cancer is characterized by profound intratumoral and inter-patient heterogeneity, spanning histological subtypes, molecular landscapes, and the tumor microenvironment. While multi-omics integration is essential for capturing this complexity, leveraging these data to explicitly define survival-associated subpopulations remains a significant challenge. In this study, we developed NeuroMDAVIS-FS, an unsupervised deep learning framework designed to stratify lung cancer patients by survival risk and to identify molecular determinants underlying improved clinical outcomes. Using the CPTAC cohort, we integrated genomic (CNV), transcriptomic (RNA-seq), and proteomic profiles to extract modality-specific features. Candidate biomarkers were validated through Kaplan-Meier (KM) survival analysis and univariate Cox proportional hazards (CoxPH) regression. A final multivariate CoxPH model effectively stratified patients into high-risk and low-risk cohorts (Kaplan-Meier p < 0.001). Notably, integrating these molecular features with baseline clinical models significantly enhanced prognostic accuracy, improving the concordance index by 43.79% in LUAD, 31.05% in LSCC, and 23.76% across the pan-lung cancer cohort. These results demonstrate that NeuroMDAVIS-FS identifies robust, biologically relevant features that surpass traditional clinical variables in predicting patient outcomes, offering a scalable path for precision oncology.
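The Kaplan-Meier analysis mentioned above estimates survival as a running product, over event times, of (1 - deaths / patients still at risk). The following is a minimal, stdlib-only Python sketch of that estimator; it is illustrative only and not the NeuroMDAVIS-FS code, and the function name `kaplan_meier` and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # patients still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical toy cohort: the patient censored at t=2 simply leaves the risk set
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
# [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Censored patients never reduce the survival estimate directly; they only shrink the denominator for later event times.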
Salome, P.; Knoll, M.; Walz, D.; Cogno, N.; Dedeoglu, A. S.; Qi, A. L.; Isakoff, S. J.; Abdollahi, A.; Jimenez, R. B.; Bitterman, D. S.; Paganetti, H.; Chamseddine, I.
Introduction: Manual data extraction from unstructured clinical notes is labor-intensive and impractical for large-scale clinical and research operations. Existing automated approaches typically require large language models, dedicated computational infrastructure, and/or task-specific fine-tuning that depends on curated data. The objective of this study is to enable accurate extraction with smaller, locally deployed models using an optimized, reusable, disease-site-specific pipeline and prompt configuration. Materials/Methods: We developed OncoRAG, a four-phase pipeline that (1) generates feature-specific search terms via ontology enrichment, (2) constructs a clinical knowledge graph from notes using biomedical named entity recognition, (3) retrieves relevant context using graph-diffusion reranking, and (4) extracts features via structured prompts. We ran OncoRAG using Microsoft Phi-3-medium-instruct (14B parameters), a mid-size language model deployed locally via Ollama. The pipeline was applied to three cohorts: triple-negative breast cancer (TNBC; 104 patients, 42 features; primary development), recurrent high-grade glioma (RiCi; 191 patients, 19 features; cross-lingual validation in German), and MIMIC-IV (100 patients, 10 features; external testing). Downstream task utility was assessed by comparing survival models for 3-year progression-free survival built from automatically extracted versus manually curated features. Results: The pipeline achieved mean F1 scores of 0.80 ± 0.07 (TNBC; 44 patients, 42 features), 0.79 ± 0.12 (RiCi; 61 patients, 19 features), and 0.84 ± 0.06 (MIMIC-IV; 100 patients, 10 features) on the test sets under the automatic configuration. Compared with direct LLM prompting and naive RAG baselines, OncoRAG improved the mean F1 score by 0.19-0.22 and 0.17-0.19, respectively. Manual configuration refinement further improved the F1 score to 0.83 (TNBC) and 0.81 (RiCi), with no change for MIMIC-IV.
Extraction time averaged 1.7-1.9 seconds per feature with the 14B model. Substituting a smaller 3.8B model reduced extraction time by 57% at the cost of a 0.03-0.10 decrease in F1 score. For TNBC, extraction time was reduced from approximately two weeks of manual abstraction to under 2.5 hours. In an exploratory survival analysis, models using automatically extracted features achieved a C-index comparable to those using manually curated features (0.77 vs 0.76; 12 events). Conclusions: OncoRAG, deployed locally with a mid-size language model, achieved accurate feature extraction from multilingual oncology notes without fine-tuning. It was validated against manual extraction for both retrieval accuracy and survival model development. Because it runs locally and requires no external data sharing, this approach addresses a critical bottleneck in scalable oncology research.
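The abstract does not specify which diffusion OncoRAG uses for graph-diffusion reranking. One common choice is personalized PageRank, in which relevance repeatedly restarts at the entities matched by a feature's search terms and spreads along knowledge-graph edges; note chunks are then ranked by the scores of the entities they mention. The sketch below assumes that choice. The graph, entity names, chunk IDs, and the `alpha` parameter are hypothetical, and this is not the OncoRAG implementation.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=100):
    """Diffuse relevance from seed entities over an undirected knowledge graph.

    graph -- dict: entity -> list of neighbouring entities (each edge listed both ways)
    seeds -- set of entities matched by a feature's search terms
    alpha -- probability of following an edge instead of restarting at a seed
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iters):  # power iteration
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            nbrs = graph[n]
            if nbrs:
                share = alpha * score[n] / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:  # isolated entity: return its mass to the seeds
                for s in seeds:
                    nxt[s] += alpha * score[n] / len(seeds)
        score = nxt
    return score


def rerank(chunk_entities, scores):
    """Order note chunks by the summed diffusion score of the entities they mention."""
    return sorted(chunk_entities,
                  key=lambda c: sum(scores.get(e, 0.0) for e in chunk_entities[c]),
                  reverse=True)


# Hypothetical graph built from one patient's notes
graph = {
    "tnbc": ["carboplatin"],
    "carboplatin": ["tnbc", "neuropathy"],
    "neuropathy": ["carboplatin", "gabapentin"],
    "gabapentin": ["neuropathy"],
}
scores = personalized_pagerank(graph, seeds={"tnbc"})
chunks = {"note_1": ["gabapentin"], "note_2": ["carboplatin"], "note_3": ["tnbc", "neuropathy"]}
print(rerank(chunks, scores))  # ['note_3', 'note_2', 'note_1']
```

Entities a few hops from the seed still receive some score, which is what lets diffusion surface context that plain keyword matching would miss.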
Ugwueke, E. C.; Azzam, M.; Zhou, M.; Teply, B. A.; Bergan, R. C.; Wan, S.; Fojo, A. T.; Leuva, H.; Wang, J.
Background: In metastatic castration-resistant prostate cancer (mCRPC), predicting treatment benefit early after treatment initiation, and relating it to overall survival (OS), remains challenging. Existing prognostic models require long-term follow-up, limiting their ability to inform timely treatment decisions. To address this gap, we evaluated tumor growth rate (g-rate)-based survival models across multiple treatment lines to assess their ability to predict OS and support early clinical decision-making. Methods: We developed GxSurv, a Random Survival Forest (RSF)-based framework that incorporates baseline clinical variables and the g-rate calculated from serial on-treatment PSA values to construct line-specific prediction models of OS, a direct measure of treatment outcome. Three variants were developed: G3Surv, using the 3-month g-rate; G6Surv, using the 6-month g-rate; and GfSurv, using the final observed g-rate. Model performance was evaluated using Harrell's C-index, Uno's C-index, the integrated Brier score (IBS), and the time-dependent area under the curve (tAUC). Model interpretability was assessed using permutation importance to quantify predictor contributions within the GxSurv framework. Findings: The study included 15,912 treatment records from 11,014 patients with mCRPC across four lines of therapy. Incorporating the g-rate consistently improved model performance across all treatment lines, with all GxSurv models outperforming Cox proportional hazards (CoxPH) models. As the earliest prognostic model, G3Surv demonstrated strong early predictive performance, with Harrell's C-index values ranging from 0.700 to 0.746 and tAUC values from 0.766 to 0.822 across all lines, representing 5-8% and 4-5% improvements over CoxPH, respectively. These results indicate that G3Surv accurately predicts individual treatment outcomes 3 months after treatment initiation.
Feature importance analyses consistently identified the g-rate as a top predictor, followed by baseline PSA and hemoglobin, with some variation in relative ranking across treatment lines. Interpretation: Integrating the g-rate calculated from on-treatment PSA values enables accurate, line-specific prediction of treatment outcomes in mCRPC, with the 3-month g-rate providing robust early prognostic information to support timely, personalized clinical decision-making. Funding: U.S. National Science Foundation, National Institutes of Health, American Cancer Society.
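Harrell's C-index, used above (and in several other abstracts on this page) to compare survival models, is the fraction of comparable patient pairs that a model orders correctly: a pair is comparable when one patient had an observed event before the other's follow-up time, and concordant when that patient was assigned the higher risk. The minimal Python sketch below assumes that tied risks count as one half and, for simplicity, skips pairs with tied times; the data are hypothetical and this is not the GxSurv code.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for a survival model.

    times  -- follow-up times; events -- 1 = event observed, 0 = censored
    risk   -- model risk scores (higher = expected to fail sooner)
    Pairs with tied times are skipped in this simplified version.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical toy data: 3 of 4 comparable pairs are ordered correctly
print(harrell_c([1, 2, 3, 4], [1, 0, 1, 0], [0.9, 0.8, 0.3, 0.5]))  # 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the reported 0.700-0.746 range sits meaningfully above chance.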
Sanjaya, P.; Pitkänen, E.
Tumour typing from whole-genome sequencing is increasingly accurate, yet molecular subtyping from somatic variants remains challenging because of tumour heterogeneity and inconsistent clinical annotations. Here, we present Mutation-Attention Dual-Task (MuAt2), a Transformer model that jointly classifies histological tumour types and subtypes directly from somatic single-nucleotide variants, indels and structural variants. MuAt2 uses encoders pre-trained on 2,587 pan-cancer whole genomes, which are subsequently fine-tuned and evaluated on 14,527 tumour whole genomes from Genomics England spanning 15 tumour types and 68 subtypes. MuAt2 outperformed aggregated-feature deep baselines and conventional machine learning models. Fine-tuning improved both accuracy and calibration across independent cohorts processed with heterogeneous variant-calling pipelines. MuAt2 embeddings organised tumours by lineage and oncogenic processes, captured molecular subtype-defining driver events and improved prognostic stratification in gliomas. Finally, MuAt2 facilitated interpretation of metastatic tumours and cancers of unknown primary by inferring plausible tissue origins from somatic variant patterns. In conclusion, MuAt2 provides a transferable and interpretable modelling framework for cancer diagnosis and prognosis directly from whole-genome somatic variation.
Gallifant, J.; Chen, S.; Shin, K.-Y.; Kellogg, K. C.; Doyle, P. F.; Guo, J.; Ye, B.; Warrington, A.; Zhai, B. K.; Hadfield, M. J.; Gusev, A.; Ricciuti, B.; Christiani, D. C.; Aerts, H. J.; Kann, B. H.; Mak, R. H.; Nelson, T. L.; Nguyen, P.; Schoenfeld, J. D.; Topaloglu, U.; Catalano, P.; Hochheiser, H. H.; Warner, J. L.; Sharon, E.; Kozono, D. E.; Savova, G. K.; Bitterman, D.
Immune-related adverse events (irAEs) affect up to 40% of patients receiving immune checkpoint inhibitors, yet their identification depends on laborious and inconsistent manual chart review. Here we developed and evaluated an agentic large language model system to extract the presence, temporality, severity grade, attribution, and certainty of six irAE types from clinical notes. Retrospectively (263 notes), the system achieved a macro-averaged F1 of 0.92 for detection and 0.66 for multi-class severity grading; self-consistency improved F1 by 0.14. The best-performing configuration cost approximately $0.02 per note. In a prospective silent deployment over three months (884 notes), detection F1 was 0.72-0.79. In a randomized crossover study of clinical trial staff (17 participants, 316 observations), agentic assistance reduced annotation time by 40% (P < 0.001), increased complete-match accuracy (OR 1.45; 95% CI 1.01-2.09; P = 0.045), and improved inter-annotator agreement (Krippendorff's α from 0.22-0.51 to 0.82-0.85). These results demonstrate that agentic AI coupled with human verification could enhance efficiency, performance, and consistency for irAE assessment.
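Self-consistency is commonly implemented by sampling several model answers for the same input and keeping the most frequent one, and macro-averaged F1 is the unweighted mean of per-class F1 scores, so rare severity grades weigh as much as common ones. The sketch below illustrates both under those assumptions; the grade labels and sampled answers are hypothetical, and the abstract does not describe the exact voting rule the system used.

```python
from collections import Counter


def self_consistent_answer(samples):
    """Majority vote over several sampled LLM answers for one note (first-seen wins ties)."""
    return Counter(samples).most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical sampled answers for one note, then predicted vs. true grades for five notes
print(self_consistent_answer(["grade 2", "grade 3", "grade 2"]))  # grade 2
print(round(macro_f1([1, 1, 2, 2, 3], [1, 2, 2, 2, 3]), 4))  # 0.8222
```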
Guler, F.; Goksuluk, D.; Xu, M.; Choudhary, G.; agraz, m.
Applying deep learning models to RNA-Seq data poses substantial challenges, primarily due to the high dimensionality of the data and limited sample sizes. To address these issues, this study introduces a deep learning pipeline that integrates feature engineering with data augmentation, applied to the classification of RNA-Seq data for disease diagnosis. The proposed framework was first validated on synthetic datasets generated from a Naive Bayes model, where MLP-based augmentation yielded a notable improvement in predictive performance. Building on this foundation, we applied the approach to chromophobe renal cell carcinoma (KICH) RNA-Seq data from The Cancer Genome Atlas (TCGA). Following standard preprocessing (normalization, transformation, and dimensionality reduction), the analysis concentrated on three aspects in relation to classification outcomes: augmentation strategies, preprocessing methods, and explainable AI (XAI) techniques. Feature selection was performed with PCA, Boruta, and RF-based methods. Three augmentation strategies (linear interpolation, SMOTE, and MixUp) were evaluated. To maintain methodological rigor, augmentation was applied exclusively to the training set, while the test set was held out for unbiased evaluation. Within this framework, we compared multiple deep learning architectures, including MLP, GNN, and the recently proposed Kolmogorov-Arnold networks (KAN). The GNN achieved the highest classification accuracy (99.47%) and the best F1 score (0.9948) when trained with MixUp augmentation combined with RF feature selection. Consequently, the GNN-based XAI framework was applied to the RF-selected dataset augmented with MixUp.
XAI analyses identified the 20 most influential genes, including HNF4A, DACH2, MAPK15, and NAT2, which contributed most to classification, thereby confirming the biological plausibility of the model outputs. To further validate model robustness, cervical cancer and Alzheimer's disease RNA-Seq datasets were also tested, yielding consistent and reliable results. Overall, the findings highlight the value of incorporating data augmentation into deep learning models for RNA-Seq analysis, not only to improve predictive performance but also to enhance biological interpretability through explainable AI approaches.
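MixUp, one of the augmentation strategies compared above, creates synthetic samples as convex combinations of two training samples and of their one-hot labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution. A minimal stdlib sketch follows; `alpha = 0.2`, the function name, and the toy expression vectors are assumptions, not the study's settings. As the abstract stresses, augmentation is applied to the training split only.

```python
import random


def mixup(X, Y, n_new, alpha=0.2, seed=0):
    """Generate n_new MixUp samples from a training split.

    X -- list of feature vectors (e.g. selected gene expression values)
    Y -- list of one-hot label vectors aligned with X
    Each new sample is lam * sample_i + (1 - lam) * sample_j, lam ~ Beta(alpha, alpha).
    """
    rng = random.Random(seed)
    X_new, Y_new = [], []
    for _ in range(n_new):
        i, j = rng.randrange(len(X)), rng.randrange(len(X))
        lam = rng.betavariate(alpha, alpha)
        X_new.append([lam * a + (1 - lam) * b for a, b in zip(X[i], X[j])])
        Y_new.append([lam * a + (1 - lam) * b for a, b in zip(Y[i], Y[j])])
    return X_new, Y_new


# Hypothetical training split: 3 samples, 2 genes, 2 classes (tumour vs normal)
X_train = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
Y_train = [[1, 0], [0, 1], [1, 0]]
X_aug, Y_aug = mixup(X_train, Y_train, n_new=10)
X_fit, Y_fit = X_train + X_aug, Y_train + Y_aug  # the test split is left untouched
```

Small `alpha` values push the mixing weight toward 0 or 1, so most synthetic samples stay close to one real sample; this is a common default, not a value reported in the abstract.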
Gainullin, V. G.; Gray, M.; Kumar, M.; Luebker, S.; Lehman, A. M.; Choudhry, O. A.; Roberta, J.; Flake, D. D.; Shanmugam, A.; Cortes, K.; Chang, E.; Uren, P. J.; Mazloom, A.; Garces, J.; Silvestri, G. A.; Chesla, D. W.; Given, R. W.; Beer, T. M.; Diehl, F.
Multi-cancer early detection (MCED) tests can detect multiple cancer types and stages. We previously developed a methylation and protein (MP V1) MCED classifier. In this study, we present a refined MP V2 classifier, developed by evaluating model architectures that improved performance in prospectively enrolled case-control cohorts under standard testing conditions. The MP V2 classifier was trained to be more generalizable and to achieve higher early-stage sensitivity at a target specificity of ≥97.0%. MP V1 and MP V2 performance was compared using a previously described test set, and MP V2 performance was also evaluated in a new independent clinical validation set. Compared with MP V1, the MP V2 classifier demonstrated a 7.3% increase in overall sensitivity, with sensitivity increases of 7.6%, 9.2%, and 8.3% for stage I, stage II, and stage I/II combined, respectively, in the intended-use test set (breast and prostate cancers excluded). In the independent intended-use validation set, the MP V2 classifier showed an overall sensitivity of 55.6%, with sensitivities of 26.8%, 42.9%, and 34.8% for stage I, stage II, and stage I/II combined, respectively. In a case-control setting, the MP V2 classifier offered improved sensitivity for early-stage cancers at a lower specificity target.
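Reporting sensitivity "at a target specificity of ≥97.0%" implies fixing the decision threshold on non-cancer controls first and only then measuring sensitivity on cancers. A hedged sketch of that generic procedure follows, assuming a higher score means a stronger cancer signal; the function names and scores are hypothetical, and this is not the MP V2 calibration method, which the abstract does not describe.

```python
import math


def threshold_at_specificity(control_scores, target=0.97):
    """Smallest control score t such that at least `target` of controls score <= t.

    Samples scoring above t are called positive (cancer signal detected).
    """
    ranked = sorted(control_scores)
    k = math.ceil(target * len(ranked) - 1e-9)  # controls that must be called negative
    return ranked[max(k, 1) - 1]


def sensitivity_specificity(case_scores, control_scores, t):
    sens = sum(s > t for s in case_scores) / len(case_scores)
    spec = sum(s <= t for s in control_scores) / len(control_scores)
    return sens, spec


# Hypothetical classifier scores
controls = list(range(100))
cases = [50, 95, 97, 98]
t = threshold_at_specificity(controls)
print(t, sensitivity_specificity(cases, controls, t))  # 96 (0.5, 0.97)
```

The small `1e-9` guard keeps floating-point error in `target * len(ranked)` from bumping the required count up by one.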
Reinosa, R.
Introduction: The precise determination of diagnostic cut-off points is essential for the development of multimarker panels in oncology. In previous work on pulmonary nodules, it was observed that the standard two-parameter logistic fit could be insufficient for biomarkers with asymmetric distributions. Furthermore, calculating empirical cut-off points from graphical visualization had limited precision and reproducibility. Objective: This study presents a methodological advance in the data analysis phase (Stage 1), introducing new Python algorithms for the direct analytical calculation of empirical intersections and for robust mathematical modeling using dual annealing with both two-parameter and four-parameter logistic functions. This improved methodology feeds into the ThresholdXpert 1.0 software tool for combinatorial optimization of biomarker panels (Stage 2) and is applied here to the diagnostic challenge of hepatocellular carcinoma (HCC). Methods: The methodology was first validated by re-analyzing a dataset of patients with pulmonary nodules (N=895). It was subsequently applied to an HCC dataset derived from the cohort of Jang et al. (208 HCC, 193 cirrhosis, 401 total), randomly divided into a training set (280) and an independent test set (121). Scripts were developed to compare the previous two-parameter logistic fit with the new two- and four-parameter logistic models. Finally, ThresholdXpert 1.0 was used for multimarker panel optimization. Results: The integration of empirical calculation, logistic modeling, and combinatorial optimization through ThresholdXpert 1.0 provides a robust and coherent framework for the development of multimarker diagnostic panels. The four-parameter logistic model provided additional validation without substantially modifying cut-off values for most biomarkers, confirming the stability of the approach while offering greater flexibility for complex distributions.
When applied to hepatocellular carcinoma, the framework identified a molecular panel composed of AFP, PIVKA-II, OPN, and DKK-1 with a sensitivity of 0.77 and specificity of 0.72, and an optimized panel incorporating inverse MELD that achieved the best overall balance (sensitivity 0.73, specificity 0.75) in independent external validation. These results demonstrate the potential of this approach as a generalizable tool for the optimized design of binary diagnostic systems in oncology. Conclusion: Integrating complementary mathematical models enhances the ability of ThresholdXpert 1.0 to identify robust diagnostic panels, which matters because in some cases a single biomarker may outperform biomarker combinations, and vice versa. This approach enabled molecular biomarkers and clinical variables to be integrated under a unified mathematical framework. Contact: roberto117343@gmail.com
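A four-parameter logistic (4PL) curve adds free lower and upper asymptotes to the logistic shape, and a cut-off can be read off analytically by inverting the fitted curve at a chosen response level rather than estimating it from a plot. The sketch below uses the common log-logistic 4PL parameterization and a 0.5 response level; both are assumptions, since the abstract gives neither the exact functional form nor the cut-off criterion, and the parameter values are hypothetical. Parameter fitting (the study reports using dual annealing, which SciPy provides as `scipy.optimize.dual_annealing`) is omitted.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic (log-logistic form), defined for x > 0.

    a -- response as x -> 0        d -- response as x -> infinity
    c -- inflection point          b -- slope (Hill) coefficient
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def four_pl_cutoff(y, a, b, c, d):
    """Analytical inverse: biomarker level at which the curve reaches response y.

    y must lie strictly between a and d.
    """
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters for one biomarker (probability of HCC vs. level)
params = dict(a=0.0, b=2.0, c=10.0, d=1.0)
cut = four_pl_cutoff(0.5, **params)
print(cut, four_pl(cut, **params))  # 10.0 0.5
```

With `a = 0` and `d = 1` the curve is bounded like a probability; letting `a` and `d` float is what gives the 4PL extra flexibility for asymmetric or saturating biomarker distributions.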
Makani, A.
Medical oncology education faces a dual crisis: knowledge velocity that outpaces static curricula, and large language model (LLM) risks (hallucination and automation bias) that threaten the fidelity of AI-assisted learning. We present Onco-Shikshak V7, an AI-native adaptive learning platform that addresses both challenges through a unified cognitive architecture grounded in learning science. The system replaces isolated educational modules with four authentic clinical workflows (Morning Report, Tumor Board, Clinic Day, and AI Textbook), each scaffolded by a nine-module pedagogy engine that integrates ACT-R activation dynamics (illness scripts), Item Response Theory (adaptive difficulty), the Free Spaced Repetition Scheduler (FSRS v4), Zone of Proximal Development (scaffolding), and metacognitive calibration training (Brier score). Six specialist AI agents (medical oncology, radiation oncology, surgical oncology, pathology, radiology, and oncology navigation) engage in multidisciplinary deliberation with per-specialty retrieval-augmented generation (RAG) grounding across nine authoritative guideline sources, including NCCN, ESMO, and ASTRO. The platform provides 18 clinical cases with decision trees across six cancer types, maps every interaction to 13 ACGME Hematology-Oncology milestones, and implements four closed-loop feedback mechanisms that connect session errors to targeted flashcards, weak domains to suggested cases, and all interactions to a persistent learner profile. Technical validation confirms algorithmic correctness across eight subsystems. To our knowledge, this is the first system to unify ACT-R, IRT, FSRS, ZPD, and metacognitive calibration in a single medical education platform. Formal learner evaluation via a randomized controlled trial is planned.
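The Brier score used for metacognitive calibration training is the mean squared difference between a learner's stated confidence and whether the answer was correct; lower is better, and a well-calibrated learner who reports 90% confidence is right about 90% of the time. A minimal sketch with hypothetical confidence ratings (the function name is illustrative, not the platform's API):

```python
def brier_score(confidences, correct):
    """Mean squared gap between stated confidence (0-1) and outcome (1 = correct, 0 = wrong)."""
    return sum((p - o) ** 2 for p, o in zip(confidences, correct)) / len(correct)


# Hypothetical learner: confident and right, unsure and wrong, then overconfident and wrong
print(round(brier_score([0.9, 0.2, 0.8], [1, 0, 0]), 4))  # 0.23
```

The overconfident wrong answer contributes 0.64 of the 0.69 total, which is the behaviour calibration training aims to penalize.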
Sun, Y.; Chang, S.; Tang, K.; LeBlanc, M. R.; Palmer, A. C.; Ahamadi, M.; Zhou, J.
Background: In immune checkpoint inhibitor (ICI) trials, overall survival (OS) benefits are well established, yet improvements in quality of life (QoL) are often inconsistent or absent in conventional analyses. This apparent discordance raises important questions: are QoL outcomes truly unrelated to survival, and how can QoL results be better utilized and interpreted? Methods: A model-based meta-analysis (MBMA) of longitudinal EORTC QLQ-C30 global health status/quality of life data from randomized ICI trials was conducted. Longitudinal QoL trajectories were analyzed using a nonlinear mixed-effects model to estimate treatment-related toxicity and long-term QoL improvement. Associations between QoL trajectory parameters and OS were assessed using Spearman rank correlation tests and Cox proportional hazards models. Results: Twenty-seven studies (8,149 ICI and 5,593 control patients) contributed longitudinal QoL data, and 18 studies provided matched OS data. Raw QoL trajectories overlapped between treatment arms, while OS consistently favored ICIs. MBMA revealed that ICIs had similar toxicity but significantly faster QoL improvement than control therapies (p < 0.0001). Baseline QoL, toxicity, and QoL improvement rate were all significantly associated with OS (p < 0.001). MBMA-based QoL comparisons were more sensitive than raw QoL data in detecting associations with survival, with the strongest association observed at Week 24 (R = -0.37, p = 0.067). Conclusions: Conventional analyses comparing QoL at a single time point may obscure meaningful patient-reported benefits. By capturing longitudinal QoL trajectories across trials, MBMA reveals how patient experience evolves alongside survival outcomes and supports improved interpretation and utilization of QoL data in treatment evaluation.
Meyer, L.; Engler, S.; Lutz, M.; Schraml, P.; Rutishauser, D.; Bertolini, A.; Lienhard, M.; Beisel, C.; Singer, F.; De Souza, N.; Beerenwinkel, N.; Moch, H.; Bodenmiller, B.
Clear cell renal cell carcinoma (ccRCC) is the leading cause of kidney cancer-related death, but how the tumor microenvironment shapes patient survival is not completely understood. Here, we describe the characterization of ccRCC tumor ecosystems from 498 patients using imaging mass cytometry, with a focus on tumor, myeloid, and T cell landscapes. Data from more than 3 million single cells are analyzed using machine learning to identify key ecosystem features that outperform basic clinical data for predicting patient survival. We define three survival ecotypes of ccRCC. Poor ecotypes correlate with the worst survival, have high ICAM1 and CD44 expression in tumor cells, and are enriched in M2-like macrophages and in interactions of exhausted CD8+ T cells with macrophages. Favorable ecotypes are characterized by high levels of VHL on tumor cells and of HLA-DR on myeloid cells and contain Th1-like CD4+ T cells. Medium ecotypes have the highest endothelial cell density and various immune-to-tumor interactions. Multi-omic characterization of these ecotypes using targeted genomic sequencing and metabolic imaging reveals distinct genomic and metabolic features, including BAP1 mutations in Poor and VHL monodriver/wild-type status in Favorable patients. We show that deep learning allows ecotype prediction directly from standard pathology H&E images. We validate the ecotypes and their associated molecular characteristics with orthogonal omics data across five clinical cohorts and more than 2,500 patients. These analyses highlight an overall survival benefit for Medium patients treated with immunotherapy. In summary, our study distills the survival-relevant information encoded in the ccRCC tumor microenvironment into prognostic survival ecotypes, which may inform clinical decision making in the future.
Niknafs, N.; Sivapalan, L.; Balan, A.; Wehr, J.; Pereira, G.; Hosseini-Nami, S.; Rao, N.; Jolly, S.; Velliangiri, K.; Beadles, I.; Loftus, T.; Chesnick, B.; Medina, J.; Xiao, W.; Pabani, A.; Marrone, K. A.; Li, Q. K.; Murray, J. C.; Rinaldi, L.; Dracopoli, N. C.; Sausen, M.; Hann, C. L.; Scott, S. C.; Feliciano, J.; Lam, V. K.; Levy, B.; Velculescu, V. E.; Brahmer, J. R.; Forde, P. M.; Vellanki, P. J.; Anagnostou, V.
Purpose: Circulating tumor DNA (ctDNA) analyses are informative as an early indicator of immunotherapy response in advanced non-small cell lung cancer (NSCLC); however, the clinical value of ctDNA molecular response requires further validation. Patients and Methods: As part of a prospective clinical protocol (NCT05995821), we conducted targeted error-correction sequencing of ctDNA (n=328) and matched WBC DNA (n=109) from 109 patients with metastatic NSCLC who received anti-PD-(L)1 either as monotherapy or in combination. Following cellular origin resolution of 2,818 variants, landmark molecular response (mR) was defined as undetectable ctDNA within 3-9 weeks of treatment initiation. Results: Pre-treatment ctDNA burden, but not blood tumor mutation burden, predicted survival. Implementing a tumor-naive WBC DNA-informed approach increased the number of evaluable cases without compromising the overall accuracy of landmark ctDNA molecular responses. A direct comparison of single-timepoint on-therapy ctDNA assessment with ctDNA dynamics from baseline to the 3-9-week interval, along with an analysis of heterogeneity in molecular response within the 3-9-week window, showed that undetectable ctDNA at the landmark timepoint can effectively predict survival outcomes. A significant enrichment in landmark ctDNA mR was noted among patients with progression-free survival (PFS) ≥6 months with immunotherapy (p=2.5e-05) and chemo-immunotherapy (p=0.02). Patients in the landmark mR group had longer progression-free (p=1.6e-06) and overall survival (p=2.5e-05) than those with molecular progression. Conclusions: Landmark ctDNA molecular response provides a real-time, accurate approach for monitoring immunotherapy clinical outcomes. Although not currently validated for regulatory use, these findings demonstrate the potential utility of ctDNA as an early endpoint in clinical trials.
Translational Relevance: Employing circulating tumor DNA (ctDNA) dynamics as an early indicator of immunotherapy response requires a roadmap for the next-generation sequencing approach, a definition of molecular response, and establishment of its clinical sensitivity. In this study, we introduce the concept of a landmark ctDNA molecular response, determined 3-9 weeks after initiation of immunotherapy, that maximizes the number of evaluable patients without sacrificing the specificity of the approach. Notably, when evaluating heterogeneity in ctDNA detection within the landmark 3-9-week window and assessing the impact of landmark interval dynamics on survival, we found that a single ctDNA assessment performed similarly to multiple ctDNA measurements within the landmark window, regardless of whether those timepoints were concordant or discordant. Our findings demonstrate that a single early on-therapy assessment of landmark ctDNA molecular response can identify patients at risk of disease progression and enable future intervention and therapy optimization.
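The landmark definition above can be expressed as a small classification rule over each patient's serial samples. The sketch below is hypothetical: it encodes "undetectable ctDNA within 3-9 weeks of treatment initiation" using the earliest sample in that window (consistent with, but not confirmed by, the finding that a single assessment performed similarly to multiple ones) and treats patients without an in-window sample as not evaluable. The study's variant calling and WBC-informed filtering are not reproduced.

```python
def landmark_response(samples, window=(3.0, 9.0)):
    """Classify one patient's landmark ctDNA molecular response (hypothetical rule).

    samples -- list of (weeks_since_treatment_start, ctdna_detected) tuples
    Returns "mR" if ctDNA is undetectable at the landmark, "no mR" if it is
    detectable, or None if no sample falls within the window (not evaluable).
    """
    in_window = sorted(s for s in samples if window[0] <= s[0] <= window[1])
    if not in_window:
        return None
    _, detected = in_window[0]  # assumption: use the earliest in-window sample
    return "no mR" if detected else "mR"


print(landmark_response([(0, True), (4, False)]))             # mR
print(landmark_response([(0, True), (5, True), (8, False)]))  # no mR
print(landmark_response([(0, True), (12, False)]))            # None
```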
Huang, C. Z.; Ching-Roa, V. D.; Heckman, C. M.; Mould, K.; Sipprell, W. H.; Smoller, B. R.; Ibrahim, S. F.; Giacomelli, M. G.
Cutaneous squamous cell carcinoma (SCC) can be time-consuming to treat with Mohs micrographic surgery (MMS) because of the need for intraoperative frozen section (FS) preparation. Two-photon fluorescence microscopy (TPFM) can generate H&E-equivalent images from fresh tissue specimens in a fraction of this time. This study aimed to determine the accuracy of TPFM for evaluating SCC in MMS margins compared with conventional FS Mohs slide preparation. TPFM was used to image 144 first-stage MMS margins from patients being treated for SCC. A Mohs surgeon reviewed 44 training images and then evaluated 100 margins. After a delay, the same surgeon evaluated the corresponding FS slides. Pairs of TPFM and FS slides were reviewed by an expert dermatopathologist to form a consensus diagnosis. The main outcome was agreement with the consensus diagnosis, as assessed by an independent dermatopathologist. Three margins (3%) unequivocally disagreed with the consensus on TPFM, and 2 margins (2%) disagreed on FS. The sensitivity and specificity of TPFM were 95.1% and 98.2%, respectively. This study demonstrates that slide-free histology can be interpreted equivalently to conventional Mohs slide processing by both MMS surgeons and dermatopathologists with minimal training.
Nishiyama, N.
Immune checkpoint inhibitors, alone or combined with chemotherapy, have emerged as promising treatments that prolong survival in patients with NSCLC. However, the majority of patients with advanced NSCLC still have a poor prognosis. Biomarkers that stratify responders and non-responders to immune checkpoint inhibitors are urgently needed to improve clinical outcomes, and would help unravel the mechanisms of the immune checkpoint pathway and the immune-tumor interactions underlying response. In this study, we analyzed clinical and gene mutation data, extracted from the MSK CHORD dataset, of NSCLC patients treated with nivolumab-containing immunotherapy with or without chemotherapy (the immunotherapy-treated group, n=119) or with chemotherapy alone (the chemotherapy-alone group, n=991). A DeepSurv model, a deep learning-based extension of the Cox proportional hazards model, was trained to generate a survival risk score for each patient, using the binary mutation statuses of thirty-one genes as input features. The thirty-one genes were selected based on population-level mutation frequency, patient-level variance in mutation status, and univariate Cox proportional hazards analyses evaluating the association between the presence or absence of each gene mutation and overall survival. The trained DeepSurv model was evaluated on the test set of the immunotherapy-treated group using the concordance index (C-index). The trained model was then applied without retraining to the entire chemotherapy-alone group as a control. The resulting C-indexes for the immunotherapy-treated and chemotherapy-alone groups were 0.789 and 0.483, respectively. Within each group, all patients were divided into high- and low-risk groups according to the median predicted risk score.
Kaplan-Meier survival curves of the high- and low-risk groups (n=43 vs n=70) in the immunotherapy-treated group showed significant separation (log-rank p<0.001), whereas no separation was observed in the chemotherapy-alone group (p=0.62). In the combined cohort of both groups, the interaction between the DeepSurv-derived risk score and treatment modality was significant (HR for interaction 1.47, 95% CI 1.32-1.65, p<0.005), suggesting that the predictive value of the DeepSurv-derived risk score is specific to immunotherapy. Principal component analysis and permutation importance analysis were performed as complementary analyses to assess individual genes associated with the DeepSurv-derived risk score, and identified ZFHX3, SMARCA4, ALK, BTK, and NOTCH2 as major contributors to survival risk stratification. Collectively, these results suggest that the nonlinear coupling pattern of the 31 tumor gene mutation statuses in the DeepSurv model captures the heterogeneity of survival risk among patients with NSCLC treated with nivolumab-containing immunotherapy with or without chemotherapy, as visualized by the clear separation between high- and low-risk groups divided at the median risk score.
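The median split and log-rank comparison described above can be sketched in a few lines of Python. This is a generic two-group log-rank test (1 degree of freedom, with the p-value taken from the chi-square upper tail via `math.erfc`), not the author's code; the function names, risk scores, and survival times are hypothetical.

```python
import math
from statistics import median


def median_split(risk):
    """True = high-risk group (risk score above the median)."""
    cut = median(risk)
    return [r > cut for r in risk]


def logrank(times, events, high):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = [g for u, g in zip(times, high) if u >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, high) if u == t and e and g)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events, high-risk group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square(1) upper tail


# Hypothetical cohort in which higher predicted risk means earlier death
risk = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
high = median_split(risk)
chi2, p = logrank(times, events, high)
print(round(chi2, 2), p < 0.01)
```

The test assumes at least one event time with more than one patient at risk in both groups; otherwise the variance is zero and the statistic is undefined.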
Diaz, F. C.; Waldrup, B.; Carranza, F. G.; Manjarrez, S.; Velazquez-Villarreal, E.
Pancreatic ductal adenocarcinoma (PDAC) is an aggressive malignancy characterized by profound molecular heterogeneity and inconsistent responses to gemcitabine-based therapy. Although KRAS mutations are nearly ubiquitous, the broader RTK-RAS and MAPK signaling networks, and their association with therapeutic response, remain insufficiently characterized. We performed an integrative clinical-genomic study of 184 PDAC tumors, stratified by age at diagnosis and gemcitabine exposure, systematically evaluating somatic alterations within curated RTK-RAS/MAPK gene panels. Conversational artificial intelligence agents (AI-HOPE-RTK-RAS and AI-HOPE-MAPK) were deployed to dynamically construct cohorts and conduct pathway-level analyses, with results subsequently confirmed using conventional statistical approaches. Among late-onset PDAC cases, ERBB2 and RET mutations were significantly enriched in gemcitabine-treated tumors. In early-onset disease, CACNA2D family alterations were more common in untreated tumors, whereas FLNB and TP53 mutations were observed at higher frequencies in treated cases. Notably, late-onset patients who did not receive gemcitabine and lacked RTK-RAS or MAPK pathway alterations demonstrated significantly improved overall survival. These findings identify age- and treatment-specific signaling dependencies extending beyond canonical KRAS alterations and reinforce a precision oncology framework in PDAC. Conversational AI enabled rapid, multidimensional integration of clinical and genomic data, facilitating the identification of clinically meaningful pathway architectures.
Abolfathi, H.; Lamaze, F. C.; Maranda-Robitaille, M.; Pellerin, K.-A.; Joubert, D.; Armero, V. S.; Gaudreault, N.; Boudreau, D. K.; Orain, M.; Desmeules, P.; Gagne, A.; Yatabe, Y.; Bosse, Y.; Joubert, P.
Introduction: Despite advancements in non-small cell lung cancer (NSCLC) management through the use of molecular biomarkers, the recently introduced 9th edition of the TNM staging system remains based exclusively on anatomic descriptors, with no consistently demonstrated improvement in risk stratification for early-stage disease. This study explores the integration of a molecular prognostic classifier into the conventional TNM staging system. Methods: We analyzed 502 patients with stage I-III lung adenocarcinoma (LUAD) who underwent surgical resection with tumor-based gene expression profiling at the Quebec Heart and Lung Institute. A molecular prognostic classifier was developed and integrated into the 9th edition TNM staging system to generate a novel model (TNMEx). Prognostic performance was compared with the 8th and 9th TNM editions using prognostic discrimination and reclassification metrics. External validation of the molecular classifier was performed in 271 LUAD cases from The Cancer Genome Atlas (TCGA). An independent cohort of 606 resected LUAD patients from the National Cancer Center Hospital (Tokyo) was used to externally compare the prognostic performance of the 8th and 9th TNM staging systems in the absence of molecular data. Results: The molecular prognostic classifier was developed based on the expression levels of 26 prognosis-associated genes, weighted by their corresponding coefficients. The classifier was subsequently integrated into the 9th edition TNM staging to generate the TNMEx model. The TNMEx system demonstrated superior prognostic performance, achieving a higher concordance index (C-index = 0.72) compared to the 9th edition TNM (C-index = 0.65, p=0.006). Moreover, TNMEx significantly improved patient risk reclassification compared to both the 8th (net reclassification improvement [NRI] = 0.27, integrated discrimination improvement [IDI] = 0.04) and 9th editions (NRI = 0.40, IDI = 0.05), underscoring its superior ability to stratify outcomes.
The 8th and 9th editions showed only limited improvement in overall prognostic accuracy and risk stratification, as reflected by their relatively modest C-index values (0.62 and 0.65, respectively) and minimal reclassification gains (NRI = -0.06, IDI = 0.003). Conclusions: Incorporating a molecular-based prognostic model significantly enhanced the ability to recognize patients at high risk and to predict their survival outcomes more accurately than traditional TNM staging systems.
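The abstract reports NRI and IDI for TNMEx against the TNM editions without stating which NRI variant was used or how censoring was handled. The sketch below assumes a simplified binary outcome, for example death within a fixed horizon. Under that assumption, the category-free (continuous) NRI and the IDI can be computed from each model's predicted risks as shown. `p_old` and `p_new` stand for hypothetical predicted probabilities from a TNM-only model and a TNMEx-like model. The study's survival-based estimates may be computed differently.

```python
from statistics import mean


def continuous_nri(p_old, p_new, events):
    """Category-free NRI for a binary outcome (events: 1 = event, 0 = none).

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)],
    where "up" means the new model assigns a higher predicted risk.
    """
    ev = [(o, n) for o, n, e in zip(p_old, p_new, events) if e]
    ne = [(o, n) for o, n, e in zip(p_old, p_new, events) if not e]

    def up(pairs):
        return sum(n > o for o, n in pairs) / len(pairs)

    def down(pairs):
        return sum(n < o for o, n in pairs) / len(pairs)

    return (up(ev) - down(ev)) + (down(ne) - up(ne))


def idi(p_old, p_new, events):
    """IDI: mean risk change among events minus mean risk change among non-events."""
    ev = [n - o for o, n, e in zip(p_old, p_new, events) if e]
    ne = [n - o for o, n, e in zip(p_old, p_new, events) if not e]
    return mean(ev) - mean(ne)
```

Both metrics reward a new model that raises predicted risk for patients who have the event and lowers it for those who do not. A negative NRI, like the -0.06 reported between TNM editions, means reclassification moved patients in the wrong direction slightly more often than in the right one.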
Karelin, A.; Brecht, I. B.; Pogoda, M.; Demidov, G.; Abele, M.; Schneider, D. T.; Aldea, D.; Etchevers, H. C.; Puig, S.; Hahn, M.; Forchhammer, S.
Background: Distinguishing benign proliferative nodules (PNs) from melanoma arising within congenital melanocytic nevi remains a major diagnostic challenge. Copy number alteration (CNA) analysis is widely used to support classification, but current criteria were developed using array comparative genomic hybridization (aCGH). The performance of alternative platforms such as shallow whole-genome sequencing (sWGS) and methylation arrays in this setting is poorly defined. Objectives: The objective of this study is to compare CNA profiles obtained from aCGH, sWGS, and methylation arrays in atypical nodules arising within congenital nevi, and to correlate these molecular findings with clinical outcomes. Methods: Sixteen samples from fourteen patients were retrospectively analyzed using all three platforms. CNAs were cataloged, concordance across methods was quantified using the Jaccard index, and molecular classifications were compared. Clinical follow-up was reviewed to provide clinical context. Results: aCGH detected 39 CNAs, sWGS 60, and methylation profiling 66. Concordance was highest between sWGS and methylation (mean Jaccard 0.67), followed by aCGH versus sWGS (0.64) and aCGH versus methylation (0.49). Cases with high aneuploidy demonstrated strong cross-platform agreement, whereas low-burden lesions exhibited greater variability between methods. Divergent molecular classifications were observed in six cases. Conclusions: While all methods reliably detect broad chromosomal changes, sWGS and methylation arrays identify many additional focal CNAs that may not align with CGH-based diagnostic criteria. Until platform-specific thresholds are established, aCGH remains the most conservative and clinically validated approach for evaluating proliferative nodules in congenital nevi. Significance: Accurate molecular classification of melanocytic proliferations in congenital nevi is essential but challenging, particularly in patients with multiple proliferative nodules.
This study provides the first systematic comparison of aCGH, sWGS, and methylation-based CNA profiling in this setting. We show that higher-resolution platforms detect substantially more focal aberrations, which can lead to discordant and potentially overcalled malignancy assessments when applying CGH-derived criteria. Our findings highlight the need for platform-adapted diagnostic frameworks and support continued use of CGH as the most conservative and clinically validated method for risk stratification.
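Cross-platform concordance in this abstract is summarized as a mean Jaccard index. The minimal sketch below assumes two things the abstract does not describe: each platform's CNA calls for a sample are a set of hashable labels (here hypothetical chromosome-arm and direction tuples), and the reported value is the mean of per-sample indices. Treating two empty call sets as fully concordant is also an assumption.

```python
from statistics import mean


def jaccard(calls_a, calls_b):
    """Jaccard index: size of the intersection over size of the union.

    Two empty call sets are treated as fully concordant (1.0); this
    convention is an assumption, not taken from the study.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def mean_jaccard(platform_a, platform_b):
    """Mean per-sample Jaccard index; each argument maps sample ID -> CNA calls."""
    return mean(jaccard(platform_a[s], platform_b[s]) for s in platform_a)


# Hypothetical calls at chromosome-arm resolution, not study data.
swgs = {"S1": {("1q", "gain"), ("6p", "gain"), ("9p", "loss")},
        "S2": {("7", "gain")}}
methylation = {"S1": {("1q", "gain"), ("6p", "gain")},
               "S2": {("7", "gain"), ("10", "loss")}}
```

With this representation, every extra focal call that only one platform reports enlarges the union without adding to the intersection. That lowers the index even when both platforms agree on the broad changes.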
Jeong, W. C.; Kim, H. H.; Hwang, Y.; Hwang, G.; Kim, K.; Ko, Y. S.
The Updated Sydney System (USS) provides a standardized framework for grading gastritis and stratifying gastric cancer risk. However, subjective observer variability and labor-intensive workflows impede its routine clinical use. To address these challenges, we developed SydneyMTL, a multi-task deep learning framework that uses Multiple Instance Learning (MIL) with task-specific attention pooling to predict severity grades across all five USS attributes simultaneously. Trained on an unprecedented cohort of 50,765 whole-slide images (WSIs), SydneyMTL generates interpretable histologic evidence for clinical practice. In retrospective evaluations against 24 board-certified pathologists, the model achieved an overall mean lenient accuracy of 89.1%, with 22 pathologists exhibiting >80% agreement with the model. When evaluated on an expert-adjudicated "Golden dataset," the model's performance improved to 90.2%, demonstrating its capacity to align with multi-expert consensus and filter out individual annotator noise. Latent space analysis confirmed that SydneyMTL captures the ordinal structure of the USS by representing disease severity as a continuous biological spectrum rather than as disjoint categories. Finally, a randomized crossover reader study showed that AI-assisted review significantly reduced interpretation time and improved inter-observer agreement, establishing SydneyMTL as a scalable tool for supporting standardized gastric cancer risk stratification.
Highlights:
- SydneyMTL is the first unified framework to simultaneously predict the full 4-tier severity grades across all five Updated Sydney System attributes.
- Trained on a massive cohort of 50,765 whole-slide images, the model aligns with multi-expert consensus on a rigorous "Golden dataset".
- AI assistance significantly reduces pathologist reading time and harmonizes inter-observer variability in real-world clinical workflows.
- Latent space analysis confirms that SydneyMTL preserves the biological ordinality of disease severity without explicit ordinal constraints.

The bigger picture: Gastritis is among the most frequent diagnoses in gastrointestinal pathology, and its histologic severity is central to gastric cancer prevention. In routine practice, pathologists convert subtle mucosal changes into semi-quantitative, ordinal grades using the Updated Sydney System, which evaluates five co-existing histologic dimensions. While this framework provides a shared language, grading is labor intensive and inherently dependent on reader-specific thresholds, creating variability that affects risk stratification and surveillance. A key concept motivating our study is that gastritis is not defined by a single finding but by multiple criteria that co-occur and interact. This suggests that computational models should learn these criteria jointly, capturing their biological correlations and the continuum of severity, rather than treating each grade as an isolated classification task. SydneyMTL implements this perspective through a unified multi-task, weakly supervised approach that learns directly from a massive cohort of 50,765 routine whole-slide images. Beyond diagnostic accuracy, our work reveals that the model preserves the ordinality of severity in its representation space, supporting the biological view that discrete clinical categories approximate an underlying continuous biological spectrum. Its attention-based explanations also connect model outputs to interpretable tissue evidence, enhancing clinical trust.
Crucially, by harmonizing inter-observer variability, SydneyMTL provides a more reliable foundation for gastric cancer risk assessment, ensuring that premalignant changes are captured with greater consistency. More broadly, our findings reposition AI for gastritis from narrow detection toward scalable, evidence-based decision support that can standardize grading practices and reduce cognitive burden on the global pathology workforce.
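SydneyMTL is described as Multiple Instance Learning with task-specific attention pooling over whole-slide images. The sketch below shows that general pattern using the attention-based MIL pooling of Ilse et al. (2018), with one attention head and one 4-grade classifier per USS attribute. The patch embeddings, the weight shapes, the names `attention_pool` and `predict_slide`, and the tiny hand-set weights are all hypothetical. The actual SydneyMTL architecture and training are not described in the abstract and may differ.

```python
import math

# The five Updated Sydney System attributes and their four severity grades.
USS_ATTRIBUTES = ["H. pylori", "neutrophil activity", "chronic inflammation",
                  "atrophy", "intestinal metaplasia"]
GRADES = ["none", "mild", "moderate", "marked"]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def attention_pool(instances, V, w):
    """Attention MIL pooling: s_k = w . tanh(V h_k), a = softmax(s), z = sum_k a_k h_k."""
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, h)))
              for h in instances]
    attn = softmax(scores)
    dim = len(instances[0])
    z = [sum(a * h[j] for a, h in zip(attn, instances)) for j in range(dim)]
    return z, attn


def predict_slide(instances, heads):
    """heads maps attribute -> (V, w, W_cls); each task pools patches separately."""
    out = {}
    for attr, (V, w, W_cls) in heads.items():
        z, attn = attention_pool(instances, V, w)
        logits = matvec(W_cls, z)
        best = max(range(len(logits)), key=logits.__getitem__)
        out[attr] = (GRADES[best], attn)  # predicted grade + patch attention
    return out
```

Because each attribute has its own attention weights, the per-task `attn` vectors can highlight different patches for, say, atrophy versus neutrophil activity. This is what makes attention maps usable as task-specific histologic evidence.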
Somer, J.; Benor, G.; Alpert, A.; Perets, R.; Mannor, S.
A recent randomized clinical trial in non-small cell lung cancer [1] confirms what numerous observational studies have reported: time of day (ToD) may dramatically influence treatment outcomes in cancer patients. In this trial, median overall survival (OS) was 28 months in the early-ToD arm versus 16.8 months in the late-ToD arm. We raise the concern that clinical trial outcomes may be influenced by seemingly minor biases in treatment time across arms. We also suggest that measuring or randomizing treatment time in clinical trials may reveal beneficial ToD-dependent treatments that would otherwise be overlooked.